Abstract

Trigrams'n'Tags (TnT) is an efficient statistical 
part-of-speech tagger. Contrary to claims found 
elsewhere in the literature, we argue that a tagger 
based on Markov models performs at least as well as 
other current approaches, including the Maximum 
Entropy framework. A recent comparison has even 
shown that TnT performs significantly better for the 
tested corpora. We describe the basic model of TnT, 
the techniques used for smoothing and for handling 
unknown words. Furthermore, we present evalua- 
tions on two corpora. 



1 Introduction 

A large number of current language processing sys- 
tems use a part-of-speech tagger for pre-processing. 
The tagger assigns a (unique or ambiguous) part-of- 
speech tag to each token in the input and passes its 
output to the next processing level, usually a parser. 
Furthermore, there is great interest in part-of-speech tagging for corpus annotation projects, which create valuable linguistic resources by a combination of automatic processing and human correction.

For both applications, a tagger with the highest possible accuracy is required. The debate about which paradigm solves the part-of-speech tagging problem best is not finished. Recent comparisons (van Halteren et al., 1998; Volk and Schneider, 1998) have shown that in most cases statistical approaches (Cutting et al., 1992; Schmid, 1995; Ratnaparkhi, 1996) yield better results than finite-state, rule-based, or memory-based taggers (Brill, 1993; Daelemans et al., 1996). They are only surpassed by combinations of different systems, forming a "voting tagger".

Among the statistical approaches, the Maximum Entropy framework has a very strong position. Nevertheless, a recent independent comparison of 7 taggers (Zavrel and Daelemans, 1999) has shown that another approach works even better: Markov models combined with a good smoothing technique and with handling of unknown words. This tagger, TnT, not only yielded the highest accuracy, it was also the fastest both in training and tagging.



The tagger comparison was organized as a "black-box test": every tagger was given the same task and the outcomes were compared. This paper describes the models and techniques used by TnT together with the implementation.

The reader will be surprised how simple the underlying model is. The result of the tagger comparison seems to support the maxim "the simplest is the best". However, in this paper we clarify a number of details that are omitted in major previous publications concerning tagging with Markov models. As two examples, (Rabiner, 1989) and (Charniak et al., 1993) give good overviews of the techniques and equations used for Markov models and part-of-speech tagging, but they are not very explicit in the details that are needed for their application. We argue that it is not only the choice of the general model that determines the result of the tagger but also the various "small" decisions on alternatives.

The aim of this paper is to give a detailed account of the techniques used in TnT. Additionally, we present results of the tagger on the NEGRA corpus (Brants et al., 1999) and the Penn Treebank (Marcus et al., 1993). The Penn Treebank results reported here for the Markov model approach are at least equivalent to those reported for the Maximum Entropy approach in (Ratnaparkhi, 1996). For a comparison to other taggers, the reader is referred to (Zavrel and Daelemans, 1999).



2 Architecture 

2.1 The Underlying Model 

TnT uses second order Markov models for part-of- 
speech tagging. The states of the model represent 
tags, outputs represent the words. Transition prob- 
abilities depend on the states, thus pairs of tags. 
Output probabilities only depend on the most re- 
cent category. To be explicit, we calculate 



$$\operatorname*{argmax}_{t_1 \ldots t_T} \left[ \prod_{i=1}^{T} P(t_i \mid t_{i-1}, t_{i-2}) \, P(w_i \mid t_i) \right] P(t_{T+1} \mid t_T) \tag{1}$$

for a given sequence of words $w_1 \ldots w_T$ of length $T$. $t_1 \ldots t_T$ are elements of the tagset; the additional tags $t_{-1}$, $t_0$, and $t_{T+1}$ are beginning-of-sequence and end-of-sequence markers. Using these additional tags, even if they stem from rudimentary processing of punctuation marks, slightly improves tagging results. This differs from formulas presented in other publications, which just stop with a "loose end" at the last word. If sentence boundaries are not marked in the input, TnT adds these tags whenever it encounters one of [.!?;] as a token.
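As an illustration of the boundary rule, the following Python sketch cuts an untagged token stream after each of the tokens [.!?;] and pads a tag sequence with the extra markers. The function names and the marker strings `<s>` and `</s>` are hypothetical choices for this example, not TnT's internal representation.

```python
from typing import List

# Tokens after which a boundary is assumed if the input marks none ([.!?;]).
BOUNDARY_TOKENS = {".", "!", "?", ";"}


def split_sentences(tokens: List[str]) -> List[List[str]]:
    """Cut a flat token stream after every boundary token."""
    sentences: List[List[str]] = []
    current: List[str] = []
    for tok in tokens:
        current.append(tok)
        if tok in BOUNDARY_TOKENS:
            sentences.append(current)
            current = []
    if current:  # material after the last boundary token
        sentences.append(current)
    return sentences


def pad_tags(tags: List[str], bos: str = "<s>", eos: str = "</s>") -> List[str]:
    """Wrap t_1 ... t_T in the markers t_{-1}, t_0 (start) and t_{T+1} (end)."""
    return [bos, bos] + tags + [eos]


print(split_sentences("It rains . Really ?".split()))
# [['It', 'rains', '.'], ['Really', '?']]
```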

Transition and output probabilities are estimated from a tagged corpus. As a first step, we use the maximum likelihood probabilities $\hat{P}$, which are derived from the relative frequencies:



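Unigrams: $$\hat{P}(t_3) = \frac{f(t_3)}{N} \tag{2}$$

Bigrams: $$\hat{P}(t_3 \mid t_2) = \frac{f(t_2, t_3)}{f(t_2)} \tag{3}$$

Trigrams: $$\hat{P}(t_3 \mid t_1, t_2) = \frac{f(t_1, t_2, t_3)}{f(t_1, t_2)} \tag{4}$$

Lexical: $$\hat{P}(w_3 \mid t_3) = \frac{f(w_3, t_3)}{f(t_3)} \tag{5}$$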



for all $t_1$, $t_2$, $t_3$ in the tagset and $w_3$ in the lexicon. $N$ is the total number of tokens in the training corpus. We define a maximum likelihood probability to be zero if the corresponding numerators and denominators are zero. As a second step, contextual frequencies are smoothed and lexical frequencies are completed by handling words that are not in the lexicon (see below).
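Equations (2) to (5) amount to counting. The Python sketch below assumes a corpus given as lists of (word, tag) pairs; the `Counts` container and the function names are hypothetical, and the boundary markers $t_{-1}$, $t_0$, $t_{T+1}$ are left out for brevity (without an end marker, the bigram estimates for a sentence-final tag do not sum to one).

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Sentence = Sequence[Tuple[str, str]]  # one sentence as (word, tag) pairs


@dataclass
class Counts:
    uni: Counter = field(default_factory=Counter)  # f(t3)
    bi: Counter = field(default_factory=Counter)   # f(t2, t3)
    tri: Counter = field(default_factory=Counter)  # f(t1, t2, t3)
    lex: Counter = field(default_factory=Counter)  # f(w3, t3)
    n: int = 0                                     # N, tokens in the corpus


def count(corpus: List[Sentence]) -> Counts:
    c = Counts()
    for sent in corpus:
        tags = [tag for _, tag in sent]
        for word, tag in sent:
            c.uni[tag] += 1
            c.lex[(word, tag)] += 1
            c.n += 1
        for i in range(1, len(tags)):
            c.bi[(tags[i - 1], tags[i])] += 1
        for i in range(2, len(tags)):
            c.tri[(tags[i - 2], tags[i - 1], tags[i])] += 1
    return c


def ratio(num: int, den: int) -> float:
    # A maximum likelihood estimate with zero counts is defined to be 0.
    return num / den if den else 0.0


def p_uni(c: Counts, t3: str) -> float:                    # equation (2)
    return ratio(c.uni[t3], c.n)


def p_bi(c: Counts, t2: str, t3: str) -> float:            # equation (3)
    return ratio(c.bi[(t2, t3)], c.uni[t2])


def p_tri(c: Counts, t1: str, t2: str, t3: str) -> float:  # equation (4)
    return ratio(c.tri[(t1, t2, t3)], c.bi[(t1, t2)])


def p_lex(c: Counts, w3: str, t3: str) -> float:           # equation (5)
    return ratio(c.lex[(w3, t3)], c.uni[t3])
```

For the two sentences *the/DT dog/NN barks/VBZ* and *the/DT cat/NN*, this gives $\hat{P}(\text{VBZ} \mid \text{DT}, \text{NN}) = 1/2$: the tag pair DT NN occurs twice, but is followed by VBZ only once.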

2.2 Smoothing 

Trigram probabilities generated from a corpus usually cannot be used directly because of the sparse-data problem: there are not enough instances of each trigram to estimate its probability reliably. Furthermore, setting a probability to zero because the corresponding trigram never occurred in the corpus has an undesired effect. Any sequence that needs this trigram receives a total probability of zero, which makes it impossible to rank different sequences that contain a zero probability.

The smoothing paradigm that delivers the best results in TnT is linear interpolation of unigrams, bigrams, and trigrams. Therefore, we estimate a trigram probability as follows:

$$P(t_3 \mid t_1, t_2) = \lambda_1 \hat{P}(t_3) + \lambda_2 \hat{P}(t_3 \mid t_2) + \lambda_3 \hat{P}(t_3 \mid t_1, t_2) \tag{6}$$

$\hat{P}$ are maximum likelihood estimates of the probabilities, and $\lambda_1 + \lambda_2 + \lambda_3 = 1$, so $P$ again represents a probability distribution.
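Equation (6) can be sketched as a short Python function. The weights in the example call are made-up values, chosen only to show that an unseen trigram still receives probability mass through the bigram and unigram terms.

```python
from typing import Tuple


def interpolate(p_uni: float, p_bi: float, p_tri: float,
                lambdas: Tuple[float, float, float]) -> float:
    """Equation (6): l1*P^(t3) + l2*P^(t3|t2) + l3*P^(t3|t1,t2)."""
    l1, l2, l3 = lambdas
    if min(lambdas) < 0 or abs(l1 + l2 + l3 - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return l1 * p_uni + l2 * p_bi + l3 * p_tri


# Unseen trigram: P^(t3|t1,t2) = 0, yet the result is roughly 0.19.
print(interpolate(0.4, 0.5, 0.0, (0.1, 0.3, 0.6)))
```

Because each of the three estimates is a distribution over $t_3$ and the weights sum to one, summing the result over all tags again gives one.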

We use the context-independent variant of linear interpolation, i.e., the values of the $\lambda$s do not depend on the particular trigram. Contrary to intuition, this yields better results than the context-dependent variant. Due to sparse-data problems, one cannot estimate a different set of $\lambda$s for each trigram. Therefore, it is common practice to group trigrams by frequency and estimate tied sets of $\lambda$s. However, we are not aware of any publication that has investigated frequency groupings for linear interpolation in part-of-speech tagging. All groupings that we have tested yielded at most equivalent results to context-independent linear interpolation. Some groupings even yielded worse results. The tested groupings included a) one set of $\lambda$s for each frequency value and b) two classes (low and high frequency) on the two ends of the scale, as well as several groupings in between and several settings for partitioning the classes.

The values of $\lambda_1$, $\lambda_2$, and $\lambda_3$ are estimated by deleted interpolation. This technique successively removes each trigram from the training corpus and estimates the best values for the $\lambda$s from all other n-grams in the corpus. Given the frequency counts for uni-, bi-, and trigrams, the weights can be determined very efficiently, with a processing time linear in the number of different trigrams. The algorithm is given in Figure 1. Note that subtracting 1 means taking unseen data into account. Without this subtraction the model would overfit the training data and would generally yield worse results.
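The procedure of Figure 1 translates almost line by line into Python. This is a sketch that assumes the frequencies are stored in `collections.Counter` objects keyed by tags and tag tuples; the figure does not say how ties between the three values are resolved, so preferring the higher-order n-gram here is our own assumption.

```python
from collections import Counter
from typing import Tuple


def safe_div(num: float, den: float) -> float:
    # Figure 1: an expression whose denominator is 0 is defined to be 0.
    return num / den if den else 0.0


def deleted_interpolation(uni: Counter, bi: Counter, tri: Counter,
                          n: int) -> Tuple[float, float, float]:
    """Return (lambda1, lambda2, lambda3) following Figure 1.

    uni[t], bi[(t1, t2)] and tri[(t1, t2, t3)] hold the n-gram frequencies;
    n is N, the size of the training corpus.
    """
    l1 = l2 = l3 = 0.0
    for (t1, t2, t3), f123 in tri.items():
        if f123 <= 0:
            continue
        # Each "- 1" removes the current trigram occurrence from the counts.
        c3 = safe_div(f123 - 1, bi[(t1, t2)] - 1)
        c2 = safe_div(bi[(t2, t3)] - 1, uni[t2] - 1)
        c1 = safe_div(uni[t3] - 1, n - 1)
        best = max(c1, c2, c3)
        if best == c3:       # ties go to the higher-order n-gram (assumption)
            l3 += f123
        elif best == c2:
            l2 += f123
        else:
            l1 += f123
    total = l1 + l2 + l3     # normalize lambda1, lambda2, lambda3
    if total == 0:           # degenerate case: no trigrams at all
        return (0.0, 0.0, 0.0)
    return (l1 / total, l2 / total, l3 / total)
```

The loop visits each distinct trigram once, so the running time is linear in the number of different trigrams.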

2.3 Handling of Unknown Words 

Currently, the method of handling unknown words that seems to work best for inflected languages is a suffix analysis as proposed in (Samuelsson, 1993). Tag probabilities are set according to the word's ending. The suffix is a strong predictor for word classes, e.g., words in the Wall Street Journal part of the Penn Treebank ending in able are adjectives (JJ) in 98% of the cases (e.g. fashionable, variable), the remaining 2% are nouns (e.g. cable, variable).

The probability distribution for a particular suf- 
fix is generated from all words in the training set 
that share the same suffix of some predefined max- 
imum length. The term suffix as used here means 
"final sequence of characters of a word" which is not 
necessarily a linguistically meaningful suffix. 

Probabilities are smoothed by successive abstraction. This calculates the probability of a tag $t$ given the last $m$ letters $l_i$ of an $n$ letter word: $P(t \mid l_{n-m+1}, \ldots, l_n)$. The sequence of increasingly more general contexts omits more and more characters of the suffix, such that $P(t \mid l_{n-m+2}, \ldots, l_n)$, $P(t \mid l_{n-m+3}, \ldots, l_n)$, ..., $P(t)$ are used for smoothing. The recursion formula is

$$P(t \mid l_{n-i+1}, \ldots, l_n) = \frac{\hat{P}(t \mid l_{n-i+1}, \ldots, l_n) + \theta_i \, P(t \mid l_{n-i+2}, \ldots, l_n)}{1 + \theta_i} \tag{7}$$



    set λ1 = λ2 = λ3 = 0
    foreach trigram t1, t2, t3 with f(t1, t2, t3) > 0
        depending on the maximum of the following three values:
            case (f(t1, t2, t3) - 1) / (f(t1, t2) - 1): increment λ3 by f(t1, t2, t3)
            case (f(t2, t3) - 1) / (f(t2) - 1):         increment λ2 by f(t1, t2, t3)
            case (f(t3) - 1) / (N - 1):                 increment λ1 by f(t1, t2, t3)
        end
    end
    normalize λ1, λ2, λ3

Figure 1: Algorithm for calculating the weights for context-independent linear interpolation λ1, λ2, λ3 when the n-gram frequencies are known. N is the size of the corpus. If the denominator in one of the expressions is 0, we define the result of that expression to be 0.



for $i = m \ldots 0$, using the maximum likelihood estimates $\hat{P}$ from frequencies in the lexicon, weights $\theta_i$, and the initialization

$$P(t) = \hat{P}(t). \tag{8}$$



The maximum likelihood estimate for a suffix of length $i$ is derived from corpus frequencies by

$$\hat{P}(t \mid l_{n-i+1}, \ldots, l_n) = \frac{f(t, l_{n-i+1}, \ldots, l_n)}{f(l_{n-i+1}, \ldots, l_n)} \tag{9}$$



For the Markov model, we need the inverse conditional probabilities $P(l_{n-i+1}, \ldots, l_n \mid t)$, which are obtained by Bayesian inversion.
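A minimal Python sketch of the recursion in equations (7) to (9), followed by the Bayesian inversion, is shown next. It assumes suffix counts stored in flat `Counter` objects rather than a trie, a single context-independent weight `theta`, and tag estimates $\hat{P}(t)$ in a dictionary; all names are hypothetical. The inversion drops the factor $P(l_{n-i+1}, \ldots, l_n)$, which is the same for every tag of a given word and therefore does not change which tag sequence is best.

```python
from collections import Counter
from typing import Dict


def suffix_tag_probs(word: str, suffix: Counter, suffix_tag: Counter,
                     p_tag: Dict[str, float], theta: float,
                     max_len: int = 10) -> Dict[str, float]:
    """P(t | l_{n-m+1}, ..., l_n) by successive abstraction, equations (7)-(9).

    suffix[s] = f(s) and suffix_tag[(s, t)] = f(t, s) are suffix frequencies
    from the lexicon; p_tag[t] is the unconditioned estimate P^(t).
    """
    # m: length of the longest suffix of `word` seen in training (<= max_len).
    m = 0
    while m < min(len(word), max_len) and suffix[word[-(m + 1):]] > 0:
        m += 1
    probs = dict(p_tag)                  # initialization (8): P(t) = P^(t)
    for i in range(1, m + 1):            # suffix lengths 1, 2, ..., m
        s = word[-i:]
        probs = {t: (suffix_tag[(s, t)] / suffix[s]     # equation (9)
                     + theta * probs[t]) / (1 + theta)   # equation (7)
                 for t in p_tag}
    return probs


def inverted(probs: Dict[str, float], p_tag: Dict[str, float]) -> Dict[str, float]:
    """Bayesian inversion P(suffix | t) = P(t | suffix) P(suffix) / P(t).

    P(suffix) is identical for every tag of one word, so it is dropped.
    """
    return {t: p / p_tag[t] for t, p in probs.items() if p_tag[t] > 0}
```

If no suffix of the word was seen in training, the loop body never runs and the sketch falls back to the unconditioned distribution $\hat{P}(t)$.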

A theoretically motivated argument uses the standard deviation of the maximum likelihood probabilities for the weights $\theta_i$ (Samuelsson, 1993).

This leaves room for interpretation. 

1) One has to identify a good value for $m$, the longest suffix used. The approach taken for TnT is the following: $m$ depends on the word in question. We use the longest suffix that we can find in the training set (i.e., for which the frequency is greater than or equal to 1), but at most 10 characters. This is an empirically determined choice.

2) We use a context-independent approach for $\theta_i$, as we did for the contextual weights $\lambda_j$. It turned out to be a good choice to set all $\theta_i$ to the standard deviation of the unconditioned maximum likelihood probabilities of the tags in the training corpus, i.e., we set

$$\theta_i = \sqrt{\frac{1}{s-1} \sum_{j=1}^{s} \left( \hat{P}(t_j) - \bar{P} \right)^2} \tag{10}$$

for all $i = 0 \ldots m-1$, using a tagset of $s$ tags and the average

$$\bar{P} = \frac{1}{s} \sum_{j=1}^{s} \hat{P}(t_j) \tag{11}$$

This usually yields values in the range 0.03 ... 0.10.

3) We use different estimates for uppercase and 
lowercase words, i.e., we maintain two different suffix 
tries depending on the capitalization of the word. 
This information improves the tagging results. 

4) Another freedom concerns the choice of the 
words in the lexicon that should be used for suf- 
fix handling. Should we use all words, or are some 
of them better suited than others? Accepting that 
unknown words are most probably infrequent, one 
can argue that using suffixes of infrequent words in 
the lexicon is a better approximation for unknown 
words than using suffixes of frequent words. There- 
fore, we restrict the procedure of suffix handling to 
words with a frequency smaller than or equal to some 
threshold value. Empirically, 10 turned out to be a 
good choice for this threshold. 
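Points 1) to 4) can be combined into one Python sketch: suffix tables built only from words with frequency at most 10, suffixes of at most 10 characters, separate tables for capitalized and lowercase words, and the weight $\theta$ from equations (10) and (11). Using plain `Counter` objects instead of suffix tries, and treating a word as uppercase when its first character is, are assumptions made for brevity; the names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Tuple

MAX_SUFFIX = 10     # point 1: suffixes of at most 10 characters
MAX_WORD_FREQ = 10  # point 4: only words with frequency <= 10 contribute

Tables = Dict[bool, Tuple[Counter, Counter]]


def build_suffix_tables(lexicon: Dict[Tuple[str, str], int]) -> Tables:
    """lexicon maps (word, tag) to f(word, tag).

    Returns one (suffix counts, suffix-tag counts) pair per capitalization
    class (point 3), keyed by "first character is uppercase".
    """
    word_freq: Counter = Counter()
    for (word, _), f in lexicon.items():
        word_freq[word] += f
    tables: Tables = {True: (Counter(), Counter()), False: (Counter(), Counter())}
    for (word, tag), f in lexicon.items():
        if word_freq[word] > MAX_WORD_FREQ:
            continue                      # frequent words are skipped
        suffix, suffix_tag = tables[word[:1].isupper()]
        for i in range(1, min(len(word), MAX_SUFFIX) + 1):
            suffix[word[-i:]] += f
            suffix_tag[(word[-i:], tag)] += f
    return tables


def theta(p_tag: Dict[str, float]) -> float:
    """Equations (10)-(11): sample standard deviation of the P^(t_j)."""
    s = len(p_tag)
    mean = sum(p_tag.values()) / s                      # equation (11)
    var = sum((p - mean) ** 2 for p in p_tag.values()) / (s - 1)
    return math.sqrt(var)                               # equation (10)
```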

2.4 Capitalization 

Additional information that turned out to be use- 
ful for the disambiguation process for several cor- 
pora and tagsets is capitalization information. Tags 
are usually not informative about capitalization, but 
probability distributions of tags around capitalized 
words are different from those not capitalized. The 
effect is larger for English, which only capitalizes 
proper names, and smaller for German, which capi- 
talizes all nouns. 

We use flags $c_i$ that are true if $w_i$ is a capitalized word and false otherwise. These flags are added to the contextual probability distributions. Instead of

$$P(t_3 \mid t_1, t_2) \tag{12}$$

we use

$$P(t_3, c_3 \mid t_1, c_1, t_2, c_2) \tag{13}$$

and the corresponding equations are updated accordingly. This is equivalent to doubling the size of the tagset and using different tags depending on capitalization.
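The following Python sketch illustrates the equivalence with a doubled tagset: each tag is paired with its capitalization flag, and trigrams are then counted over these pairs exactly as over plain tags. Treating a word as capitalized when its first character is uppercase, and the function names, are assumptions for this example.

```python
from collections import Counter
from typing import List, Sequence, Tuple

CapTag = Tuple[str, bool]  # (t_i, c_i)


def cap_tags(words: Sequence[str], tags: Sequence[str]) -> List[CapTag]:
    """Pair each tag t_i with c_i = "w_i starts with an uppercase letter"."""
    return [(t, w[:1].isupper()) for w, t in zip(words, tags)]


def count_cap_trigrams(corpus: Sequence[Tuple[Sequence[str], Sequence[str]]]) -> Counter:
    """Trigram frequencies over the doubled tagset, as needed for (13)."""
    tri: Counter = Counter()
    for words, tags in corpus:
        aug = cap_tags(words, tags)
        for i in range(2, len(aug)):
            tri[(aug[i - 2], aug[i - 1], aug[i])] += 1
    return tri
```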



2.5 Beam Search 



The processing time of the Viterbi algorithm (Rabiner, 1989) can be reduced by introducing a beam search. Each state that receives a $\delta$ value smaller than the largest $\delta$ divided by some threshold value $\theta$ is excluded from further processing. While the Viterbi algorithm is guaranteed to find the sequence of states with the highest probability, this is no longer true when beam search is added. Nevertheless, for practical purposes and the right choice of $\theta$, there is virtually no difference between the algorithm with and without a beam. Empirically, a value of $\theta = 1000$ turned out to approximately double the speed of the tagger without affecting the accuracy.
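A compact Python sketch of second-order Viterbi decoding with this kind of beam is given below. States are tag pairs, scores are kept as log probabilities (so dividing by $\theta$ becomes subtracting $\log \theta$), and `trans` and `emit` stand for already smoothed transition and output probabilities. These callables, the boundary tag names, and the use of a second-order transition into the end marker are assumptions, not TnT's actual code.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

BOS, EOS = "<s>", "</s>"  # hypothetical names for t_{-1}, t_0 and t_{T+1}
State = Tuple[str, str]   # (t_{i-1}, t_i): states are pairs of tags


def viterbi_beam(words: Sequence[str], tags: Sequence[str],
                 trans: Callable[[str, str, str], float],
                 emit: Callable[[str, str], float],
                 beam: float = 1000.0) -> List[str]:
    """Best tag sequence for equation (1), with beam pruning.

    trans(t1, t2, t3) ~ P(t3 | t1, t2) and emit(w, t) ~ P(w | t) are assumed
    to be smoothed already. Scores are log probabilities to avoid underflow.
    """
    log_beam = math.log(beam)
    delta: Dict[State, Tuple[float, List[str]]] = {(BOS, BOS): (0.0, [])}
    for w in words:
        new: Dict[State, Tuple[float, List[str]]] = {}
        for (t1, t2), (score, path) in delta.items():
            for t3 in tags:
                p = trans(t1, t2, t3) * emit(w, t3)
                if p <= 0.0:
                    continue
                cand = score + math.log(p)
                if (t2, t3) not in new or cand > new[(t2, t3)][0]:
                    new[(t2, t3)] = (cand, path + [t3])
        if not new:
            raise ValueError(f"no tag can emit {w!r}")
        # Beam: drop states whose delta is below (largest delta) / beam.
        best = max(score for score, _ in new.values())
        delta = {s: v for s, v in new.items() if v[0] >= best - log_beam}
    # Close the sequence with the end marker t_{T+1} (second-order: assumption).
    final = [(score + math.log(trans(t1, t2, EOS)), path)
             for (t1, t2), (score, path) in delta.items()
             if trans(t1, t2, EOS) > 0.0]
    if not final:
        raise ValueError("no state can reach the end-of-sequence marker")
    return max(final, key=lambda item: item[0])[1]
```

With `beam=1.0` only the best state survives each step, which turns the search into a greedy one; larger values keep more states and approach exact Viterbi decoding.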

The tagger currently tags between 30,000 and 
60,000 tokens per second (including file I/O) on a 
Pentium 500 running Linux. The speed mainly de- 
pends on the percentage of unknown words and on 
the average ambiguity rate. 

3 Evaluation 

We evaluate the tagger's performance with respect to several aspects. First of all, we determine the tagging accuracy averaged over ten iterations. The overall accuracy, as well as separate accuracies for known and unknown words, are measured.

Second, learning curves are presented, which indicate the performance when using training corpora of different sizes, starting with as few as 1,000 tokens and ranging to the size of the entire corpus (minus the test set).

An important characteristic of statistical taggers 
is that they not only assign tags to words but also 
probabilities in order to rank different assignments. 
We distinguish reliable from unreliable assignments by the quotient of the best and second best assignments¹. All assignments for which this quotient is larger than some threshold are regarded as reliable, the others as unreliable. As we will see below, accuracies for reliable assignments are much higher.

The tests are performed on partitions of the cor- 
pora that use 90% as training set and 10% as test 
set, so that the test data is guaranteed to be unseen 
during training. Each result is obtained by repeat- 
ing the experiment 10 times with different partitions 
and averaging the single outcomes. 

In all experiments, contiguous test sets are used. 
The alternative is a round-robin procedure that puts 
every 10th sentence into the test set. We argue that 
contiguous test sets yield more realistic results be- 
cause completely unseen articles are tagged. Using 
the round-robin procedure, parts of an article are al- 
ready seen, which significantly reduces the percent- 
age of unknown words. Therefore, we expect even 



higher results when testing on every 10th sentence 
instead of a contiguous set of 10%. 
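The two ways of building test sets can be contrasted in a short Python sketch. It assumes the corpus is a list of sentences and that the ten partitions are the ten consecutive blocks of 10%; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def contiguous_split(sents: Sequence[T], fold: int,
                     k: int = 10) -> Tuple[List[T], List[T]]:
    """Fold `fold` of k: one contiguous block (1/k of the data) is the test set."""
    lo, hi = fold * len(sents) // k, (fold + 1) * len(sents) // k
    return list(sents[:lo]) + list(sents[hi:]), list(sents[lo:hi])


def round_robin_split(sents: Sequence[T], fold: int,
                      k: int = 10) -> Tuple[List[T], List[T]]:
    """Alternative: every k-th sentence (offset `fold`) goes to the test set."""
    train = [s for i, s in enumerate(sents) if i % k != fold]
    test = [s for i, s in enumerate(sents) if i % k == fold]
    return train, test
```

With the contiguous split, neighbouring sentences of a test article stay out of training, which is why fewer test words are already known than under round-robin.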

In the following, accuracy denotes the number of 
correctly assigned tags divided by the number of to- 
kens in the corpus processed. The tagger is allowed 
to assign exactly one tag to each token. 

We distinguish the overall accuracy, taking into 
account all tokens in the test corpus, and separate 
accuracies for known and unknown tokens. The lat- 
ter are interesting, since usually unknown tokens are 
much more difficult to process than known tokens, 
for which a list of valid tags can be found in the 
lexicon. 
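The accuracy measures defined above can be computed as in the following Python sketch, assuming one (word, gold tag, assigned tag) triple per token and a set of words seen in training; the function and key names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def accuracies(results: Iterable[Tuple[str, str, str]],
               lexicon: Set[str]) -> Dict[str, float]:
    """results holds (word, gold tag, assigned tag), exactly one tag per token.

    A token counts as known if its word occurs in the training lexicon.
    Assumes at least one token.
    """
    correct = {"known": 0, "unknown": 0}
    total = {"known": 0, "unknown": 0}
    for word, gold, guess in results:
        kind = "known" if word in lexicon else "unknown"
        total[kind] += 1
        correct[kind] += int(gold == guess)
    acc = {k: correct[k] / total[k] if total[k] else 0.0 for k in total}
    n = total["known"] + total["unknown"]
    acc["overall"] = (correct["known"] + correct["unknown"]) / n
    acc["unknown_share"] = total["unknown"] / n  # percentage of unknowns / 100
    return acc
```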

3.1 Tagging the NEGRA corpus 

The German NEGRA corpus consists of 20,000 sentences (355,000 tokens) of newspaper texts (Frankfurter Rundschau) that are annotated with parts-of-speech and predicate-argument structures (Skut et al., 1997). It was developed at the Saarland University in Saarbrücken². Part of it was tagged at the IMS Stuttgart. This evaluation only uses the part-of-speech annotation and ignores structural annotations.

Tagging accuracies for the NEGRA corpus are shown in Table 2.

Figure 3 shows the learning curve of the tagger, i.e., the accuracy depending on the amount of training data. Training length is the number of tokens used for training. Each training length was tested ten times; training and test sets were randomly chosen and disjoint, and results were averaged. The training length is given on a logarithmic scale.

It is remarkable that tagging accuracy for known 
words is very high even for very small training cor- 
pora. This means that we have a good chance of 
getting the right tag if a word is seen at least once 
during training. Average percentages of unknown 
tokens are shown in the bottom line of each diagram. 

We exploit the fact that the tagger not only determines tags, but also assigns probabilities. If there is an alternative that has a probability "close to" that of the best assignment, this alternative can be viewed as almost equally well suited. The notion of "close to" is expressed by the distance of probabilities, and this in turn is expressed by the quotient of probabilities. So, the distance of the probabilities of a best tag $t_{best}$ and an alternative tag $t_{alt}$ is expressed by $p(t_{best})/p(t_{alt})$, which is some value greater than or equal to 1, since the best tag assignment has the highest probability.
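A per-token version of this reliability test might look as follows in Python. It assumes that each candidate tag of a token already comes with a probability score; how those scores are derived from the tagger's lattice is not specified here, and the function name is hypothetical.

```python
import math
from typing import Dict, Tuple


def reliability(tag_scores: Dict[str, float],
                threshold: float) -> Tuple[str, float, bool]:
    """Return (best tag, p(t_best) / p(t_alt), reliable?) for one token."""
    ranked = sorted(tag_scores.items(), key=lambda kv: kv[1], reverse=True)
    best_tag, p_best = ranked[0]
    if len(ranked) == 1 or ranked[1][1] == 0.0:
        quotient = math.inf                # only one possible tag
    else:
        quotient = p_best / ranked[1][1]   # always >= 1
    return best_tag, quotient, quotient > threshold
```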

Figure 4 shows the accuracy when separating assignments with quotients larger and smaller than the threshold (hence reliable and unreliable assignments). As expected, we find that accuracies for



¹ By definition, this quotient is ∞ if there is only one possible tag for a given word.

² For availability, please check http://www.coli.uni-sb.de/sfb378/negra-corpus



Table 2: Part-of-speech tagging accuracy for the NEGRA corpus, averaged over 10 test runs; training and test set are disjoint. The table shows the percentage of unknown tokens, separate accuracies and standard deviations for known and unknown tokens, as well as the overall accuracy.

                  percentage   known          unknown        overall
                  unknowns     acc.    σ      acc.    σ      acc.    σ
    NEGRA corpus  11.9%        97.7%   0.23   89.0%   0.72   96.7%   0.29



[Figure 3 plot: "NEGRA Corpus: POS Learning Curve". x-axis: training length (×1000 tokens, logarithmic scale); y-axis: accuracy. Curves: overall (min 78.1%, max 96.7%), known (min 95.7%, max 97.7%), unknown (min 61.2%, max 89.0%). Average percentage of unknown tokens at training lengths 1, 2, 5, 10, 20, 50, 100, 200, 320, 500, 1000 (×1000): 50.8, 46.4, 41.4, 36.0, 30.7, 23.0, 18.3, 14.3, 11.9, 10.3, 8.4.]

Figure 3: Learning curve for tagging the NEGRA corpus. The training sets of variable sizes as well as test sets of 30,000 tokens were randomly chosen. Training and test sets were disjoint, the procedure was repeated 10 times and results were averaged. Percentages of unknowns for 500k and 1000k training are determined from an untagged extension.



[Figure 4 plot: "NEGRA Corpus: Accuracy of reliable assignments". x-axis: threshold θ (logarithmic scale, 1 to 10,000); y-axis: accuracy of reliable assignments (min 96.7%, max 99.4%). Bottom rows, for increasing θ: % cases reliable: 100, 97.9, 95.1, 92.7, 90.3, 86.8, 84.1, 81.0, 76.1, 71.9, 68.3, 64.1, 62.0; accuracy of complement: 53.5, 62.9, 69.6, 74.5, 79.8, 82.7, 85.2, 88.0, 89.6, 90.8, 91.8, 92.2.]

Figure 4: Tagging accuracy for the NEGRA corpus when separating reliable and unreliable assignments. The curve shows accuracies for reliable assignments. The numbers at the bottom line indicate the percentage of reliable assignments and the accuracy of the complement set (i.e., unreliable assignments).



Table 5: Part-of-speech tagging accuracy for the Penn Treebank. The table shows the percentage of unknown tokens, separate accuracies and standard deviations for known and unknown tokens, as well as the overall accuracy.

                   percentage   known          unknown        overall
                   unknowns     acc.    σ      acc.    σ      acc.    σ
    Penn Treebank  2.9%         97.0%   0.15   85.5%   0.69   96.7%   0.15



[Figure 6 plot: "Penn Treebank: POS Learning Curve". x-axis: training length (×1000 tokens, logarithmic scale); y-axis: accuracy. Curves: overall (min 78.6%, max 96.7%), known (min 95.2%, max 97.0%), unknown (min 62.2%, max 85.5%). Average percentage of unknown tokens: 4.4 at 500 and 2.9 at 1000 (×1000 tokens).]

Figure 6: Learning curve for tagging the Penn Treebank. The training sets of variable sizes as well as test sets of 100,000 tokens were randomly chosen. Training and test sets were disjoint, the procedure was repeated 10 times and results were averaged.



[Figure 7 plot: "Penn Treebank: Accuracy of reliable assignments". x-axis: threshold θ (logarithmic scale, 1 to 10,000); y-axis: accuracy of reliable assignments. Bottom rows, for increasing θ: % cases reliable: 100, 97.7, 94.6, 92.2, 89.8, 86.3, 83.5, 80.4, 76.6, 73.8, 71.0, 67.2, 64.5; accuracy of complement: 53.5, 62.8, 68.9, 73.9, 79.3, 82.6, 85.2, 87.5, 88.8, 89.8, 91.0, 91.6.]

Figure 7: Tagging accuracy for the Penn Treebank when separating reliable and unreliable assignments. The curve shows accuracies for reliable assignments. The numbers at the bottom line indicate the percentage of reliable assignments and the accuracy of the complement set.



reliable assignments are much higher than for unre- 
liable assignments. This distinction is, e.g., useful 
for annotation projects during the cleaning process, 
or during pre-processing, so the tagger can emit mul- 
tiple tags if the best tag is classified as unreliable. 

3.2 Tagging the Penn Treebank 

We use the Wall Street Journal as contained in the Penn Treebank for our experiments. The annotation consists of four parts: 1) a context-free structure augmented with traces to mark movement and discontinuous constituents, 2) phrasal categories that are annotated as node labels, 3) a small set of grammatical functions that are annotated as extensions to the node labels, and 4) part-of-speech tags (Marcus et al., 1993). This evaluation only uses the part-of-speech annotation.

The Wall Street Journal part of the Penn Tree- 
bank consists of approx. 50,000 sentences (1.2 mil- 
lion tokens). 

Tagging accuracies for the Penn Treebank are shown in Table 5. Figure 6 shows the learning curve of the tagger, i.e., the accuracy depending on the amount of training data. Training length is the number of tokens used for training. Each training length was tested ten times. Training and test sets were disjoint, and results are averaged. The training length is given on a logarithmic scale. As for the NEGRA corpus, tagging accuracy is very high for known tokens even with small amounts of training data.

We exploit the fact that the tagger not only determines tags, but also assigns probabilities. Figure 7 shows the accuracy when separating assignments with quotients larger and smaller than the threshold (hence reliable and unreliable assignments). Again, we find that accuracies for reliable assignments are much higher than for unreliable assignments.



3.3 Summary of Part-of-Speech Tagging 
Results 

Average part-of-speech tagging accuracy is between 96% and 97%, depending on language and tagset, which is at least on a par with state-of-the-art results found in the literature, possibly better. For the Penn Treebank, (Ratnaparkhi, 1996) reports an accuracy of 96.6% using the Maximum Entropy approach; our much simpler and therefore faster HMM approach delivers 96.7%. This comparison needs to be re-examined, since we use ten-fold cross-validation and average the results, while Ratnaparkhi makes only one test run.

The accuracy for known tokens is significantly higher than for unknown tokens. For the German newspaper data, results are 8.7 percentage points better when the word was seen before and therefore is in the lexicon than when it was not seen before (97.7% vs. 89.0%).
Accuracy for known tokens is high even with very 



small amounts of training data. As few as 1000 to- 
kens are sufficient to achieve 95%-96% accuracy for 
them. It is important for the tagger to have seen a 
word at least once during training. 

Stochastic taggers assign probabilities to tags. We 
exploit the probabilities to determine reliability of 
assignments. For a subset that is determined during 
processing by the tagger we achieve accuracy rates 
of over 99%. The accuracy of the complement set is 
much lower. This information can, e.g., be exploited 
in an annotation project to give an additional treat- 
ment to the unreliable assignments, or to pass se- 
lected ambiguities to a subsequent processing step. 

4 Conclusion 

We have shown that a tagger based on Markov models yields state-of-the-art results, despite contrary claims found in the literature. For example, the Markov model tagger used in the comparison of (van Halteren et al., 1998) yielded worse results than all other taggers. In our opinion, a reason for the wrong claim is that the basic algorithms leave several decisions to the implementor. This rather large amount of freedom was not handled in detail in previous publications: handling of start- and end-of-sequence, the exact smoothing technique, how to determine the weights for context probabilities, details on handling unknown words, and how to determine the weights for unknown words. Note that the decisions we made yield good results for both the German and the English corpus. They do so for several other corpora as well. The architecture remains applicable to a large variety of languages.

According to current tagger comparisons (van Halteren et al., 1998; Zavrel and Daelemans, 1999), and according to a comparison of the results presented here with those in (Ratnaparkhi, 1996), the Maximum Entropy framework seems to be the only other approach yielding results comparable to the one presented here. It is a very interesting future research topic to determine the advantages of either of these approaches, to find the reason for their high accuracies, and to find a good combination of both.

TnT is freely available to universities and related organizations for research purposes (see http://www.coli.uni-sb.de/~thorsten/tnt).